Prediction of aqueous solubility based on large datasets using several QSPR models utilizing topological structure representation.

Authors

  • Joseph R Votano
  • Marc Parham
  • Lowell H Hall
  • Lemont B Kier
  • L Mark Hall
Abstract

Several QSPR models were developed for predicting intrinsic aqueous solubility, S(o). A data set of 5,964 neutral compounds was sub-divided into two classes, aromatic and non-aromatic compounds. Three models were created with different methods on both data sets: two regression models (multiple linear regression and partial least squares) and an artificial neural network model. These models were based on 3343 aromatic and 1674 non-aromatic compounds for training sets; 938 compounds were used in external validation testing. The range in -log S(o) is -1.6 to 10. Topological structure descriptors were used with all models. A genetic algorithm was used for descriptor selection for regression models. For the artificial neural network (ANN) model, descriptor selection was done with a backward elimination process. All models performed well, with r² values ranging from 0.72 to 0.84 in external validation testing. The mean absolute errors in validation ranged from 0.44 to 0.80 for the classes of compounds for all the models. These statistical results indicate a sound ANN model. Furthermore, in a comparison with eight other available models, based on predictions using a validation test set (442 compounds), the artificial neural network model presented in this work (CSLogWS) was clearly superior based on both the mean absolute error and the percentage of residuals less than one log unit. In the ANN model both E-State and hydrogen E-State descriptors were found to be important.
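
The validation statistics named in the abstract (r², mean absolute error, and the percentage of residuals under one log unit) can be illustrated with a minimal sketch. The function name `validation_stats` and the sample values are hypothetical and are not taken from the paper; r² is assumed here to mean the squared Pearson correlation between observed and predicted -log S(o), which the abstract does not specify.

```python
def validation_stats(observed, predicted):
    """Summarize agreement between observed and predicted -log S(o) values.

    Assumption: r2 is the squared Pearson correlation coefficient.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty lists of equal length")

    # Residual = predicted minus observed, in log units.
    residuals = [p - o for o, p in zip(observed, predicted)]

    # Mean absolute error over the validation set.
    mae = sum(abs(r) for r in residuals) / n

    # Percentage of compounds predicted to within one log unit.
    pct_within_one = 100.0 * sum(abs(r) < 1.0 for r in residuals) / n

    # Squared Pearson correlation (undefined if either list is constant).
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r2 = cov * cov / (var_o * var_p)

    return {"mae": mae, "pct_within_one": pct_within_one, "r2": r2}


# Hypothetical values for illustration only, not data from the paper.
observed = [1.0, 2.5, 4.0, 6.0]
predicted = [1.5, 2.0, 5.5, 6.0]
stats = validation_stats(observed, predicted)
print(stats)
```

With these made-up inputs, three of the four residuals fall under one log unit, so the sketch reports 75% alongside the mean absolute error.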

Similar resources

Application of Random Forest and Multiple Linear Regression Techniques to QSPR Prediction of an Aqueous Solubility for Military Compounds.

The relationship between the aqueous solubility of more than two thousand eight hundred organic compounds and their structures was investigated using a QSPR approach based on Simplex Representation of Molecular Structure (SiRMS). The dataset consists of 2537 diverse organic compounds. Multiple Linear Regression (MLR) and Random Forest (RF) methods were used for statistical modeling at the 2D le...

A novel topological descriptor based on the expanded wiener index: Applications to QSPR/QSAR studies

In this paper, a novel topological index, named M-index, is introduced based on expanded form of the Wiener matrix. For constructing this index the atomic characteristics and the interaction of the vertices in a molecule are taken into account. The usefulness of the M-index is demonstrated by several QSPR/QSAR models for different physico-chemical properties and biological activities of a large...

Prediction of boiling point and water solubility of crude oil hydrocarbons using sub-structural molecular fragments method

The quantitative structure–property relationship (QSPR) method is used to develop the correlation between structures of crude oil hydrocarbons (80 compounds) and their boiling point and water solubility. Sub-structural molecular fragments (SMF) calculated from structure alone were used to represent molecular structures. A subset of the calculated fragments selected using stepwise regression (fo...

Chem. Pharm. Bull. 55(4) 669–674 (2007)

... important molecular property, playing a large role in the behavior of compounds in many areas of interest. Given the importance of solubility, a means of prediction based solely on molecular structure should prove a useful tool, as many compounds exist for which the solubility simply is not available. The solubility of chemicals and drugs in the water phase has an essential influence on the extent o...

QSPR Studies on Vapor Pressure, Aqueous Solubility, and the Prediction of Water-Air Partition Coefficients

The vapor pressures and the aqueous solubilities of 411 compounds with a large structural diversity were investigated using a quantitative structure-property relationship (QSPR) approach. A five-descriptor equation with the squared correlation coefficient (R2) of 0.949 for vapor pressure and a six-descriptor equation with R2 of 0.879 for aqueous solubility were obtained. All descriptors were de...

Journal:
  • Chemistry & Biodiversity

Volume 1, Issue 11

Publication date: 2004